Do not swap with projection when file is partitioned #14956

Merged
merged 3 commits into from
Mar 2, 2025

blaginin
Contributor

@blaginin blaginin commented Mar 1, 2025

Which issue does this PR close?

Rationale for this change

I think there should be a way to apply the swap even when the file is partitioned, but I'm submitting this as a hotfix since it's a release blocker.

What changes are included in this PR?

Are these changes tested?

Added a test

Are there any user-facing changes?

No

@blaginin
Contributor Author

blaginin commented Mar 1, 2025

uv run pytest 
================================================== 472 passed, 4 skipped, 47 deselected, 90 warnings in 23.16s ===================================================

Contributor

@alamb alamb left a comment

Thanks for working on this @blaginin

It would also be great to find a reproducer somehow. I don't have any more time this morning to help, but I can try to find some later today or tomorrow morning.

- Ok(all_alias_free_columns(projection.expr()).then(|| {
+ Ok((all_alias_free_columns(projection.expr())
+     && self.table_partition_cols.is_empty())
Contributor

It seems like the check should be whether any of the needed columns are in table_partition_cols (rather than whether there are any partition columns at all) 🤔 Or something like that
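
A minimal sketch of the difference between the two checks discussed here, written against a hypothetical `ScanSchema` type rather than DataFusion's real structs. It assumes partition columns are appended after the file columns in the table schema, so a projected index at or beyond the number of file columns refers to a partition column. The type, its fields, and both method names are illustrative only and are not taken from the PR.

```rust
/// Hypothetical stand-in for a file scan's schema (not a DataFusion type).
/// `file_columns` are stored in the file itself; `partition_columns` come
/// from the directory layout and are assumed to be appended after them.
struct ScanSchema {
    file_columns: Vec<String>,
    partition_columns: Vec<String>,
}

impl ScanSchema {
    /// True when a table-schema index points past the file columns,
    /// i.e. at one of the appended partition columns.
    fn is_partition_index(&self, index: usize) -> bool {
        index >= self.file_columns.len()
    }

    /// Blanket check: refuse the swap whenever any partition column exists.
    fn can_swap_blanket(&self) -> bool {
        self.partition_columns.is_empty()
    }

    /// Narrowed check: refuse the swap only when the projection actually
    /// references a partition column.
    fn can_swap_narrowed(&self, projected: &[usize]) -> bool {
        projected.iter().all(|&i| !self.is_partition_index(i))
    }
}
```

With two file columns and one partition column, the blanket check always refuses the swap, while the narrowed check still allows a projection of indices 0 and 1 and refuses only a projection that includes index 2. Under the same assumption, index 2 is exactly the index that would be out of bounds for a two-column file schema.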

Contributor Author

great point 🤗

@blaginin blaginin force-pushed the bugfix/do-not-swap-proj-for-paritions branch from 88695a2 to 55f8b30 Compare March 1, 2025 21:37
@github-actions github-actions bot added the core Core DataFusion crate label Mar 1, 2025
@blaginin blaginin marked this pull request as ready for review March 1, 2025 23:13
Contributor

@alamb alamb left a comment

Thank you @blaginin -- this is great. I was worried that this didn't handle the case where the pushed projection has an expression, but I wrote a test (I will make a follow-on PR) and it seems to work.

Nice job

@alamb alamb merged commit 5e27008 into apache:main Mar 2, 2025
25 checks passed
@alamb
Contributor

alamb commented Mar 2, 2025

I made a small follow on:

alamb pushed a commit to alamb/datafusion that referenced this pull request Mar 2, 2025
* Do not swap with projection when file is partitioned

* Narrow the case when not swapping

* Add test
alamb added a commit that referenced this pull request Mar 2, 2025
* Do not swap with projection when file is partitioned

* Narrow the case when not swapping

* Add test

Co-authored-by: Dmitrii Blaginin <[email protected]>
Labels
core Core DataFusion crate
Development

Successfully merging this pull request may close these issues.

index out of bounds: the len is 2 but the index is 2 in some data sources
2 participants